Diffusion-based generative models have achieved remarkable success in image generation. Their guidance formulation allows an external model to control the generation process in a plug-and-play manner for various tasks without fine-tuning the diffusion model. However, directly using publicly available off-the-shelf models for guidance fails due to their poor performance on noisy inputs. The existing practice is therefore to fine-tune the guidance models on labeled data corrupted with noise. In this paper, we argue that this practice has limitations in two aspects: (1) handling inputs with widely varying noise levels is too hard for a single model; (2) collecting labeled datasets hinders scaling up to various tasks. To tackle these limitations, we propose a novel strategy that leverages multiple experts, where each expert specializes in a particular noise range and guides the reverse process at its corresponding timesteps. However, as it is infeasible to manage multiple networks and utilize labeled data, we present a practical guidance framework termed Practical Plug-And-Play (PPAP), which leverages parameter-efficient fine-tuning and data-free knowledge transfer. We conduct extensive ImageNet class-conditional generation experiments showing that our method can successfully guide diffusion with a small number of trainable parameters and no labeled data. Finally, we show that image classifiers, depth estimators, and semantic segmentation models can guide publicly available GLIDE through our framework in a plug-and-play manner.
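To make the timestep-to-expert routing concrete, here is a minimal sketch of classifier guidance with per-noise-range experts; all names (`experts`, the uniform range split, the guidance `scale`) are illustrative assumptions, not the paper's actual implementation:

```python
import torch

def expert_for_timestep(experts, t, T):
    """Pick the expert whose noise range covers timestep t (uniform split assumed)."""
    k = min(int(t / T * len(experts)), len(experts) - 1)
    return experts[k]

def guided_mean(x_t, t, T, mean, experts, target_class, scale=1.0):
    """Shift the reverse-process mean by the active expert's class gradient."""
    x_t = x_t.detach().requires_grad_(True)
    expert = expert_for_timestep(experts, t, T)
    # Log-probability of the target class under the expert for this noise range.
    log_prob = torch.log_softmax(expert(x_t, t), dim=-1)[:, target_class].sum()
    grad = torch.autograd.grad(log_prob, x_t)[0]
    return mean + scale * grad  # classifier-guidance-style update
```

Because only the expert for the current timestep is queried, the per-step cost matches single-model guidance while each expert only ever sees noise levels it was adapted for.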
Creating novel views from a single image has made significant progress with advanced autoregressive models. Although recent methods generate high-quality novel views, synthesizing with only one explicit or implicit 3D geometry suffers a trade-off between two objectives, which we call the "seesaw" problem: 1) preserving reprojected contents and 2) completing realistic out-of-view regions. In addition, autoregressive models require considerable computational cost. In this paper, we propose a single-image view synthesis framework that alleviates the seesaw problem. The proposed model is an efficient non-autoregressive model with both implicit and explicit renderers. Motivated by the observation that explicit methods preserve reprojected pixels well while implicit methods complete realistic out-of-view regions, we introduce loss functions that let the two renderers complement each other: explicit features improve the reprojected regions of the implicit features, and implicit features improve the out-of-view regions of the explicit features. With the proposed architecture and loss functions, we alleviate the seesaw problem, outperform autoregressive-based state-of-the-art methods, and generate images roughly $100\times$ faster. We validate the efficiency and effectiveness of our method with experiments on the RealEstate10K and ACID datasets.
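As a rough illustration of how the two renderers could supervise each other, the sketch below assumes a reprojection visibility mask `vis` (1 where source pixels reproject into the target view, 0 elsewhere); the specific loss form and names are assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def complementary_losses(f_explicit, f_implicit, vis):
    """Complementary guidance between explicit and implicit renderers (sketch).

    vis: 1 where pixels are reprojected from the source view, 0 out-of-view.
    Each renderer supervises the other only in the region it handles well,
    with stop-gradients so guidance flows one way per region.
    """
    # Explicit features guide implicit ones on reprojected (visible) regions.
    loss_reproj = (vis * F.l1_loss(f_implicit, f_explicit.detach(),
                                   reduction="none")).mean()
    # Implicit features guide explicit ones on out-of-view regions.
    loss_outview = ((1 - vis) * F.l1_loss(f_explicit, f_implicit.detach(),
                                          reduction="none")).mean()
    return loss_reproj + loss_outview
```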
CNN-based face recognition models have brought remarkable performance improvements, but they are vulnerable to adversarial perturbations. Recent studies show that adversaries can fool such models even when they can only access the models' hard-label outputs. However, since many queries are needed to find imperceptible adversarial noise, reducing the number of queries is essential for these attacks. In this paper, we point out two limitations of existing decision-based black-box attacks. We observe that they waste queries optimizing noise in the background, and that they do not reuse adversarial perturbations generated for other images. We exploit 3D face alignment to overcome these limitations and propose a general strategy for query-efficient black-box attacks on face recognition, named Geometrically Adaptive Dictionary Attack (GADA). Our core idea is to create an adversarial perturbation in the UV texture map and project it onto the face in the image. This greatly improves query efficiency by limiting the perturbation search space to the facial area and effectively recycling previous perturbations. We apply the GADA strategy to two existing attack methods and show overwhelming performance improvements in experiments on the LFW and CPLFW datasets. Furthermore, we present a novel attack strategy that can circumvent similarity-based stateful detection, which identifies the process of query-based black-box attacks.
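The UV projection step could look roughly like the following sketch, assuming a per-pixel UV coordinate map from 3D face alignment is given; names and tensor shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def project_uv_perturbation(image, delta_uv, uv, face_mask):
    """Project a UV-texture-space perturbation onto the face region (sketch).

    image:     (1, 3, H, W) input face image
    delta_uv:  (1, 3, Ht, Wt) adversarial perturbation in the UV texture map
    uv:        (1, H, W, 2) per-pixel texture coordinates in [-1, 1],
               obtained from 3D face alignment (assumed given)
    face_mask: (1, 1, H, W) binary mask of the facial area
    """
    # Warp the UV-space perturbation into image space.
    delta_img = F.grid_sample(delta_uv, uv, align_corners=False)
    # Restrict the perturbation to the face region only.
    return (image + face_mask * delta_img).clamp(0, 1)
```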
Photometric consistency loss is one of the representative objective functions commonly used for self-supervised monocular depth estimation. However, because it provides incorrect guidance, this loss often leads to unstable depth predictions in textureless or occluded regions. Recent self-supervised learning approaches tackle this issue by exploiting feature representations explicitly learned from auto-encoders, expecting them to be more discriminative than the input image. Despite using auto-encoded features, we observe that such an approach does not embed features that are as discriminative as the auto-encoded ones. In this paper, we propose a residual guided loss that enables the depth estimation network to embed discriminative features by transferring the discriminability of auto-encoded features. We conduct experiments on the KITTI benchmark and verify our method's superiority over, and orthogonality to, other state-of-the-art methods.
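As a schematic of feature-level guidance in this spirit (not the paper's exact residual formulation), the depth network's intermediate features can be pulled toward frozen auto-encoder features so that their discriminability transfers:

```python
import torch

def guided_feature_loss(depth_feats, ae_feats):
    """Transfer the discriminability of auto-encoded features (sketch).

    depth_feats: features from the depth estimation network
    ae_feats:    features from a pre-trained auto-encoder (kept fixed)
    """
    # Stop gradients through the auto-encoder so it acts purely as a guide.
    return torch.mean((depth_feats - ae_feats.detach()) ** 2)
```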
While deep neural networks show unprecedented performance in various tasks, their vulnerability to adversarial examples hinders their deployment in safety-critical systems. Many studies have shown that attacks are possible even in a black-box setting, where the attacker cannot access the target model's internal information. Most black-box attacks are query-based, with each query obtaining the target model's output for an input, and many recent studies focus on reducing the number of required queries. In this paper, we draw attention to an implicit assumption of these attacks: that the target model's output corresponds exactly to the query input. If some randomness is introduced into the model, this assumption breaks, and query-based attacks face great difficulty in both gradient estimation and local search, the core of their attack process. From this motivation, we observe that even a small additive input noise can neutralize most query-based attacks, and we name this simple yet effective approach Small Noise Defense (SND). We analyze how SND defends against query-based black-box attacks and demonstrate its effectiveness against eight state-of-the-art attacks on the CIFAR-10 and ImageNet datasets. Even with its strong defensive ability, SND almost maintains the original classification accuracy and computational speed. SND is readily applicable to pre-trained models by adding only one line of code at inference time.
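The "one line of code" plausibly amounts to adding small Gaussian noise to each query input at inference; a minimal sketch, with the noise scale `sigma` as an illustrative hyperparameter:

```python
import torch

def predict_with_snd(model, x, sigma=0.01):
    """Small Noise Defense at inference: repeated queries of the same input
    see slightly different effective inputs (sigma value is illustrative)."""
    x_noisy = x + sigma * torch.randn_like(x)  # the added "one line"
    return model(x_noisy)
```

Because the noise is resampled per query, finite-difference gradient estimates and local-search comparisons made by the attacker become unreliable, while a small enough `sigma` leaves clean accuracy nearly unchanged.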
We consider the end-to-end abstract-to-title generation problem, exploring seven recent transformer based models (including ChatGPT) fine-tuned on more than 30k abstract-title pairs from NLP and machine learning venues. As an extension, we also consider the harder problem of generating humorous paper titles. For the latter, we compile the first large-scale humor annotated dataset for scientific papers in the NLP/ML domains, comprising almost 2.5k titles. We evaluate all models using human and automatic metrics. Our human evaluation suggests that our best end-to-end system performs similarly to human authors (but arguably slightly worse). Generating funny titles is more difficult, however, and our automatic systems clearly underperform relative to humans and often learn dataset artefacts of humor. Finally, ChatGPT, without any fine-tuning, performs at the level of our best fine-tuned system.
Online ride-hailing services have become a prevalent transportation system across the world. In this paper, we study the challenging problem of how to direct vacant taxis around a city so that supply and demand can be balanced in online ride-hailing services. We design a new reward scheme that considers multiple performance metrics of online ride-hailing services. We also propose a novel deep reinforcement learning method named Deep-Q-Network with Action Mask (AM-DQN), which masks off unnecessary actions in various locations so that agents can learn much faster and more efficiently. We conduct extensive experiments using a city-scale dataset from Chicago. Several popular heuristic and learning methods are also implemented as baselines for comparison. The experimental results show that AM-DQN attains the best performance of all methods with respect to average failure rate, average customer waiting time, and average idle search time for vacant taxis.
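A common way to realize such an action mask, shown here as a sketch rather than the paper's exact implementation, is to set the Q-values of invalid actions to negative infinity before action selection and TD-target computation:

```python
import torch

def masked_q_values(q_values, action_mask):
    """Mask off unavailable actions before action selection (sketch).

    q_values:    (batch, num_actions) raw Q-network outputs
    action_mask: (batch, num_actions) 1 for valid actions, 0 for invalid
                 (e.g. directions that leave the city grid at a location)
    """
    # Invalid actions get -inf so they are never selected by argmax
    # and never contribute to the max in the TD target.
    return q_values.masked_fill(action_mask == 0, float("-inf"))

# Greedy action selection with the mask applied:
# action = masked_q_values(q_net(state), mask).argmax(dim=-1)
```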
Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards. The key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state ('Go'), and only then explore into unknown terrain ('Explore'). We refer to such exploration after a goal is reached as 'post-exploration'. In this paper, we present a clear ablation study of post-exploration in a general intrinsically motivated goal exploration process (IMGEP) framework, which the Go-Explore paper did not provide. We study the isolated potential of post-exploration by turning it on and off within the same algorithm, under both tabular and deep RL settings, on both discrete navigation and continuous control tasks. Experiments on a range of MiniGrid and MuJoCo environments show that post-exploration indeed helps IMGEP agents reach more diverse states and boosts their performance. In short, our work suggests that RL researchers should consider using post-exploration in IMGEP when possible, since it is effective, method-agnostic, and easy to implement.
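Schematically, post-exploration is a small addition to a goal-conditioned episode loop: once the goal is reached, the agent keeps acting for a few extra steps. The sketch below assumes a gym-style environment interface and a goal-conditioned agent; all names are illustrative:

```python
def run_episode(env, agent, goal, post_explore=True, post_steps=10, max_steps=200):
    """Goal-conditioned episode with optional post-exploration (schematic)."""
    obs = env.reset()
    for _ in range(max_steps):
        obs, reward, done, info = env.step(agent.act(obs, goal))
        if done:  # 'Go' phase finished: goal reached or episode ended
            break
    if post_explore:
        # 'Explore' phase: keep acting beyond the goal (here: randomly)
        # to discover states the goal-directed policy would not visit.
        for _ in range(post_steps):
            obs, reward, done, info = env.step(env.action_space.sample())
    return obs
```

Turning the `post_explore` flag on and off within the same loop is exactly the kind of isolated ablation the abstract describes.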
Knowledge graphs, modeling multi-relational data, improve numerous applications such as question answering or graph logical reasoning. Many graph neural networks for such data emerged recently, often outperforming shallow architectures. However, the design of such multi-relational graph neural networks is ad-hoc, driven mainly by intuition and empirical insights. Up to now, their expressivity, their relation to each other, and their (practical) learning performance are poorly understood. Here, we initiate the study of deriving a more principled understanding of multi-relational graph neural networks. Namely, we investigate the limitations in the expressive power of the well-known Relational GCN and Compositional GCN architectures and shed some light on their practical learning performance. By aligning both architectures with a suitable version of the Weisfeiler-Leman test, we establish under which conditions both models have the same expressive power in distinguishing non-isomorphic (multi-relational) graphs or vertices with different structural roles. Further, by leveraging recent progress in designing expressive graph neural networks, we introduce the $k$-RN architecture that provably overcomes the expressiveness limitations of the above two architectures. Empirically, we confirm our theoretical findings in a vertex classification setting over small and large multi-relational graphs.
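The Weisfeiler-Leman alignment can be illustrated with a relational variant of 1-WL color refinement, in which a vertex's new color hashes its old color together with the multiset of (relation, neighbor-color) pairs; a small self-contained sketch (the exact test version used in the paper may differ):

```python
def relational_wl(vertices, edges, rounds=3):
    """Relational 1-WL color refinement (illustrative sketch).

    vertices: iterable of vertex ids
    edges:    list of (u, relation, v) triples
    Two vertices assigned different colors are distinguishable by any GNN
    whose expressive power matches this test.
    """
    colors = {v: 0 for v in vertices}  # uniform initial coloring
    for _ in range(rounds):
        new_colors = {}
        for v in vertices:
            # Multiset of (relation, neighbor color) pairs incident to v.
            neigh = sorted((r, colors[u]) for (u, r, w) in edges if w == v)
            new_colors[v] = hash((colors[v], tuple(neigh)))
        colors = new_colors
    return colors
```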
We present Camelira, a web-based Arabic multi-dialect morphological disambiguation tool that covers four major variants of Arabic: Modern Standard Arabic, Egyptian, Gulf, and Levantine. Camelira offers a user-friendly web interface that allows researchers and language learners to explore various linguistic information, such as part-of-speech, morphological features, and lemmas. Our system also provides an option to automatically choose an appropriate dialect-specific disambiguator based on the prediction of a dialect identification component. Camelira is publicly accessible at http://camelira.camel-lab.com.